
13 research outputs found

Seeking unique and common biological themes in multiple gene lists or datasets: pathway pattern extraction pipeline for pathway-level comparative analysis

Author: Anney Che
Ming Yi
Robert M Stephens
Uma Mudunuri
Yi Ming
Publication venue: BioMed Central
Publication date: 01/01/2009

Abstract
Background: One of the challenges in the analysis of microarray data is to integrate and compare the selected (e.g., differential) gene lists from multiple experiments for common or unique underlying biological themes. A common way to approach this problem is to extract common genes from these gene lists and then subject these genes to enrichment analysis to reveal the underlying biology. However, the capacity of this approach is largely restricted by the limited number of common genes shared by datasets from multiple experiments, which could be caused by the complexity of the biological system itself.
Results: We now introduce a new Pathway Pattern Extraction Pipeline (PPEP), which extends the existing WPS application by providing a new pathway-level comparative analysis scheme. To facilitate comparing and correlating results from different studies and sources, PPEP contains new interfaces that allow evaluation of the pathway-level enrichment patterns across multiple gene lists. As an exploratory tool, this analysis pipeline may help reveal the underlying biological themes at both the pathway and gene levels. The analysis scheme provided by PPEP begins with multiple gene lists, which may be derived from different studies in terms of the biological contexts, applied technologies, or methodologies. These lists are then subjected to pathway-level comparative analysis for extraction of pathway-level patterns. This analysis pipeline helps to explore the commonality or uniqueness of these lists at the level of pathways or biological processes from different but relevant biological systems using a combination of statistical enrichment measurements, pathway-level pattern extraction, and graphical display of the relationships of genes and their associated pathways as Gene-Term Association Networks (GTANs) within the WPS platform. As a proof of concept, we have used the new method to analyze many datasets from our collaborators as well as some public microarray datasets.
Conclusion: This tool provides a new pathway-level analysis scheme for integrative and comparative analysis of data derived from different but relevant systems. The tool is freely available as a Pathway Pattern Extraction Pipeline implemented in our existing software package WPS, which can be obtained at http://www.abcc.ncifcrf.gov/wps/wps_index.php
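The pathway-level comparison described in this abstract can be illustrated with a minimal sketch. It is not the PPEP or WPS implementation: the abstract does not name the enrichment statistic, so a one-sided hypergeometric test is assumed here, and the function names, the 0.05 cutoff, and the toy gene and pathway identifiers are all hypothetical.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N genes, K in pathway, n drawn)."""
    upper = min(K, n)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total

def enrichment_matrix(gene_lists, pathways, background_size):
    """Return {pathway: {list_name: p-value}} for every pathway/list pair."""
    matrix = {}
    for pname, members in pathways.items():
        row = {}
        for lname, genes in gene_lists.items():
            overlap = len(members & genes)
            row[lname] = hypergeom_sf(overlap, background_size, len(members), len(genes))
        matrix[pname] = row
    return matrix

def classify(matrix, alpha=0.05):
    """Split pathways into those enriched in every list and those enriched in one."""
    common, unique = [], {}
    for pname, row in matrix.items():
        hits = [lname for lname, p in row.items() if p <= alpha]
        if len(hits) == len(row):
            common.append(pname)
        elif len(hits) == 1:
            unique[pname] = hits[0]
    return common, unique

# Toy input (hypothetical identifiers): the two lists share only three genes,
# yet both point at the same pathway.
pathways = {
    "apoptosis": {"g1", "g2", "g3", "g4", "g5"},
    "cell_cycle": {"g6", "g7", "g8", "g9", "g10"},
}
gene_lists = {
    "A": {"g1", "g2", "g3", "g4", "g6"},
    "B": {"g1", "g2", "g3", "g5", "g7", "g8", "g9", "g10"},
}
matrix = enrichment_matrix(gene_lists, pathways, background_size=100)
common, unique = classify(matrix)
print(common, unique)
```

In this toy case the comparison is made on pathway-level p-values rather than on the intersection of the gene lists, which is the shift in perspective the abstract argues for.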

Springer

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Knowledge and Theme Discovery across Very Large Biological Data Sets Using Distributed Queries: A Prototype Combining Unstructured and Structured Data

Author: Anney Che (491890)
Brian T. Luke (171128)
F. Pascal Girard (491891)
Girish Venkataraman (491889)
Mohamad Khouja (491887)
Robert M. Stephens (17045)
Stephen Repetski (491888)
Uma S. Mudunuri (463683)
Publication date: 02/12/2013

As the discipline of biomedical science continues to apply new technologies capable of producing unprecedented volumes of noisy and complex biological data, it has become evident that available methods for deriving meaningful information from such data are simply not keeping pace. In order to achieve useful results, researchers require methods that consolidate, store and query combinations of structured and unstructured data sets efficiently and effectively. As we move towards personalized medicine, the need to combine unstructured data, such as medical literature, with large amounts of highly structured and high-throughput data such as human variation or expression data from very large cohorts, is especially urgent. For our study, we investigated a likely biomedical query using the Hadoop framework. We ran queries using native MapReduce tools we developed as well as other open source and proprietary tools. Our results suggest that the available technologies within the Big Data domain can reduce the time and effort needed to utilize and apply distributed queries over large datasets in practical clinical applications in the life sciences domain. The methodologies and technologies discussed in this paper set the stage for a more detailed evaluation that investigates how various data structures and data models are best mapped to the proper computational framework.

Directory of Open Access Journals

PubMed Central

FigShare

Growth of articles in MEDLINE.

Author: Anney Che (491890)
Brian T. Luke (171128)
F. Pascal Girard (491891)
Girish Venkataraman (491889)
Mohamad Khouja (491887)
Robert M. Stephens (17045)
Stephen Repetski (491888)
Uma S. Mudunuri (463683)

A bar chart displaying the number of records in each NLM MEDLINE baseline release from 2001 to 2012 (http://www.nlm.nih.gov/bsd/licensee/2012_stats/baseline_doc.html).

FigShare

Network of Cancer-Gene associations from literature.

Author: Anney Che (491890)
Brian T. Luke (171128)
F. Pascal Girard (491891)
Girish Venkataraman (491889)
Mohamad Khouja (491887)
Robert M. Stephens (17045)
Stephen Repetski (491888)
Uma S. Mudunuri (463683)

Network of Cancer/Gene associations displaying shared genes between cancers and genes specific to certain cancer types based on literature evidence. Cancer terms are represented as labeled nodes, genes are unlabeled pink nodes and the edges represent at least one publication with a co-occurrence of the cancer term and gene.
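The edge rule in this caption (an edge wherever at least one publication mentions both the cancer term and the gene) can be sketched as follows. This is only an illustration in plain Python, not the authors' Hadoop/MapReduce code; the record layout, the function names, and the placeholder PMIDs and gene labels are assumptions.

```python
from collections import defaultdict

def cooccurrence_counts(records):
    """Count distinct publications per (cancer term, gene) pair.

    Each record is (pmid, set of cancer terms, set of genes) found in one article.
    """
    pmids = defaultdict(set)
    for pmid, cancers, genes in records:
        for cancer in cancers:
            for gene in genes:
                pmids[(cancer, gene)].add(pmid)
    return {pair: len(ids) for pair, ids in pmids.items()}

def shared_and_specific(counts):
    """Separate genes linked to several cancers from genes linked to one."""
    cancers_by_gene = defaultdict(set)
    for cancer, gene in counts:
        cancers_by_gene[gene].add(cancer)
    shared = {g for g, cs in cancers_by_gene.items() if len(cs) > 1}
    specific = {g: next(iter(cs)) for g, cs in cancers_by_gene.items() if len(cs) == 1}
    return shared, specific

# Placeholder records: PMIDs and gene labels are made up for illustration.
records = [
    (1, {"breast cancer"}, {"GENE_A", "GENE_B"}),
    (2, {"lung cancer"}, {"GENE_A"}),
    (3, {"breast cancer"}, {"GENE_A"}),
]
counts = cooccurrence_counts(records)       # every key is an edge (count >= 1)
shared, specific = shared_and_specific(counts)
print(counts, shared, specific)
```

Every key in `counts` corresponds to one edge of such a network, and the count itself is the kind of quantity a co-occurrence bubble chart would map to bubble size.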

FigShare

Load and query times using simulated gene expression data.

Author: Anney Che (491890)
Brian T. Luke (171128)
F. Pascal Girard (491891)
Girish Venkataraman (491889)
Mohamad Khouja (491887)
Robert M. Stephens (17045)
Stephen Repetski (491888)
Uma S. Mudunuri (463683)

* Query to get the DEG list was not run on the 8 TB data due to time constraints.

FigShare

Architecture for integrating structured and unstructured data in Hadoop.

Author: Anney Che (491890)
Brian T. Luke (171128)
F. Pascal Girard (491891)
Girish Venkataraman (491889)
Mohamad Khouja (491887)
Robert M. Stephens (17045)
Stephen Repetski (491888)
Uma S. Mudunuri (463683)

Architectural diagram detailing the steps in creating the categorical lexicons and using them to get the PMID counts from literature. DEG stands for Differentially Expressed Gene while DE miRNA stands for Differentially Expressed miRNA.

FigShare

Bubble chart of Cancer-Gene associations from literature.

Author: Anney Che (491890)
Brian T. Luke (171128)
F. Pascal Girard (491891)
Girish Venkataraman (491889)
Mohamad Khouja (491887)
Robert M. Stephens (17045)
Stephen Repetski (491888)
Uma S. Mudunuri (463683)

A bubble chart representation with cancer terms on the x-axis and genes on the y-axis. The size of the bubble is directly proportional to the number of literature articles where the cancer and gene terms co-occur.

FigShare

Cancer term occurrences in the literature.

Author: Anney Che (491890)
Brian T. Luke (171128)
F. Pascal Girard (491891)
Girish Venkataraman (491889)
Mohamad Khouja (491887)
Robert M. Stephens (17045)
Stephen Repetski (491888)
Uma S. Mudunuri (463683)

A bar chart representation with cancer terms on the y-axis and publication counts on the x-axis. Only the cancer terms with high literature occurrences are shown.

FigShare

Table3.DOCX

Author: Anney
Autism Genome Project
Baranzini
Box
Che
Chung
Chung
Cordell
Das
Emily
Ionita-Laza
Ionita-Laza
Li
Li
Li
Lin
Liu
Liu
Ma
Madsen
Manolio
Matsunami
McCarthy
Pinto
Purcell
Satterthwaite
Schaffner
Urbanowicz
von Mering
Wan
Wang
Welch
Wiens
Wu
Wu
Yates
Zaykin
Zhao
Zheng
Publication venue: Frontiers Media SA

Crossref